
Multimodal Joint Visual Attention Model for Natural Human-Robot Interaction in Domestic Environments


Abstract

Due to population ageing, the cost of health care will rise in the coming years. One way to help humans, especially elderly people, is to introduce domestic robots that assist in daily life, making people less dependent on home care. Joint visual attention models can be used for natural human-robot interaction. Joint visual attention means that two humans, or a robot and a human, share attention to the same object; it can be established by pointing, by eye gaze, or by speech. The goal of this thesis is to develop a non-verbal joint visual attention model for object detection that integrates gestures, gaze, saliency and depth. The question answered in this report is: how can the information from gestures, gaze, saliency and depth be integrated in the most efficient way to determine the object of interest?

Existing joint visual attention models only work when the human is in front of the robot, so that the human is in view of the camera. Our model should be more flexible than existing models: it needs to work in different configurations of human, robot and object, and it should be able to determine the object of interest even when the pointing direction or the gaze location is not available.

The saliency algorithm of Itti et al. [1] is used to create a bottom-up saliency map. The second bottom-up cue, depth, is determined by segmenting the environment to extract the objects. Apart from the bottom-up cues, top-down cues are used as well. The pointing finger is identified, and the pointing direction is retrieved from the eigenvalues and eigenvectors of the finger's point cloud. A pointing map is then created from the angle between the 3D pointing direction vector and the 3D vector from the pointing finger to the object. A hybrid model computes a gaze map, switching between a texture-based and a color-based approach depending on the textureness of the object.

Depending on the configuration of human, robot and object, three or four maps are available to determine the object of interest. In configurations where the pointing map or the gaze map is not available, the combined saliency map is obtained by point-wise multiplication of the three available maps. When all four maps are available, they are added and the sum is multiplied by the pointing mask. When human and robot face each other and the pointing, bottom-up saliency and depth maps are combined, 93.3% of the objects are detected correctly. When the human stands next to the robot and the gaze, bottom-up saliency and depth maps are combined, the detection rate is 67.8%. When robot, human and object form a triangle, the detection rate is 96.3%.

The main contribution is a joint visual attention model that detects objects of interest in different configurations of human, robot and object, and that still works when one of the four cues is unavailable. Furthermore, a hybrid model has been developed that creates a probability gaze map, choosing a texture-based or a color-based approach depending on the textureness of the object. The probability pointing map is generated from 3D point cloud data instead of 2D information, which yields an accurate pointing map.
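The pointing-direction step admits a compact illustration: the principal axis of the segmented finger's 3D points is the eigenvector of their covariance matrix with the largest eigenvalue. A minimal sketch in Python/NumPy, assuming the finger has already been segmented into an (N, 3) array; the sign-disambiguation heuristic is an assumption, not taken from the thesis:

```python
import numpy as np

def pointing_direction(finger_points):
    """Estimate the 3D pointing direction as the principal axis of the
    finger point cloud, i.e. the eigenvector of the covariance matrix
    with the largest eigenvalue. finger_points: (N, 3) array."""
    centered = finger_points - finger_points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))  # ascending order
    direction = eigvecs[:, -1]                             # principal axis
    # Fix the sign so the axis runs from hand towards fingertip.
    # Heuristic (assumption): the fingertip is the point with the
    # largest absolute projection onto the axis.
    proj = centered @ direction
    if proj[np.argmax(np.abs(proj))] < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)
```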
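The pointing map scores each 3D point by the angle between the pointing direction and the vector from the fingertip to that point. A sketch under stated assumptions: the scene is an organized (H, W, 3) point cloud, and the Gaussian falloff over the angle and its width `sigma` are illustrative choices, not specified in the abstract:

```python
import numpy as np

def pointing_map(cloud, fingertip, direction, sigma=np.deg2rad(10.0)):
    """Probability map over an organized (H, W, 3) point cloud: points
    lying close to the pointing ray (small angle between `direction`
    and the fingertip-to-point vector) score high."""
    vecs = cloud - fingertip                              # (H, W, 3)
    norms = np.linalg.norm(vecs, axis=-1)
    norms = np.where(norms == 0, np.finfo(float).eps, norms)
    cos_a = np.clip((vecs @ direction) / norms, -1.0, 1.0)
    angle = np.arccos(cos_a)                              # (H, W), radians
    prob = np.exp(-0.5 * (angle / sigma) ** 2)            # Gaussian falloff
    return prob / prob.max()                              # normalise to [0, 1]
```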
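The fusion rule described above can be written out directly; the function and argument names, and the final normalisation, are placeholders for illustration:

```python
import numpy as np

def combine_maps(saliency, depth, gaze=None, pointing=None, pointing_mask=None):
    """Fuse the available cue maps into one combined saliency map.
    At most one of `gaze`/`pointing` may be missing. Three maps:
    point-wise multiplication. All four: addition, gated by the
    pointing mask, as described in the abstract."""
    if gaze is not None and pointing is not None:
        combined = (saliency + depth + gaze + pointing) * pointing_mask
    elif pointing is not None:        # gaze unavailable
        combined = saliency * depth * pointing
    else:                             # pointing unavailable
        combined = saliency * depth * gaze
    return combined / combined.max()  # normalisation added for illustration
```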

Bibliographic record

  • Author: Domhof, J.F.M.
  • Year: 2015
  • Format: PDF
  • Language: en
